-
-
Notifications
You must be signed in to change notification settings - Fork 71
Add Smithsonian process and report script #278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
TimidRobot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to see this coming together! Acknowledging this is a WIP, you'll need to update labels and descriptions to provide context and make the data more meaningful.
It might also be worth looking to see if there's any way to automatically get the full names for units. I have no idea what the abbreviations mean.
cb5294c to
15720d1
Compare
TimidRobot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is coming along nicely, keep up the good work!
| ] | ||
| QUARTER = os.path.basename(PATHS["data_quarter"]) | ||
|
|
||
| unit_map = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment indicating how this data was compiled. Something like "Manually compiled from information on URL".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, does the source(s) have information on:
2026-02-09 16:31:34,002 - WARNING - smithsonian_fetch - New unit code(s) not in unit_map: ['ACAH', 'CHSDM', 'EEPA', 'FSA', 'NAA', 'NASMAC', 'NMAIA', 'NPMA', 'SAAMPAIK', 'SI']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, constants should be ALL CAPS (UNIT_MAP)
| "TOTAL_OBJECTS", | ||
| ] | ||
| HEADER_2_UNITS = [ | ||
| "UNIT", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to store both unit code and unit names since the unit names are not available from the API.
| "Overview", | ||
| None, | ||
| None, | ||
| "The Smithsonian data returns the overall " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove space at end of string
| f"The results indicate a total record of {total_objects} objects," | ||
| f" with a breakdown of {CC0_records} objects without CC0 Media and" | ||
| f" {CC0_records_with_media} objects with CC0 Media, taking a" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please format numbers with commas to improve readability.
| " representing museums, libraries, zoos and many other" | ||
| f" with a minimum of {min_unit} objects.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
min_unitsseems to be misleading. Shouldn't it be something likemin_objects?- How does a "a minimum of 147 objects" relate to the 3rd plot that shows Anacostia Community Museum Archives with 57 works?
- Please update wording:
and many other➡️ and other institutions
| "Plots showing totals by units.", | ||
| "This shows the distribution of top 10" | ||
| " units/ sub providers across smithsonian" | ||
| f" with an average of {average_unit} objects" | ||
| " across the top 10 sub providers.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please hard wrap at 79 instead of 53
- Please define "NMNH"
- Please format number with commas to improve readability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- First use of "Smithsonian" in report should be "Smithsonian Institute"
- I think the information presented to the user should replace "units" and "sub providers" with "institute members"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The institution member (units) plots are dominated by the member names:
I propose a function be added to this shared library that will insert a newline roughly halfway through the name (replacing a space).
Alternatively, the abbreviations could be used and the definitions of those abbreviations could be added as text below the plot.
I'm open to other ideas, as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I would like to test the first one to see how it looks like.
| ) | ||
|
|
||
|
|
||
| def plot_totals_by_records(args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new plot is interesting!
Please:
- populate "bar x scale" value (I expect it should be linear instead of None)
- see note about label/name length
- indicate why the 10 records are shown (ex. 10 with hightest cc0 media percentage)
- key shouldn't obscure data


Description
Checklist
Update index.md).mainormaster).visible errors.
Developer Certificate of Origin
For the purposes of this DCO, "license" is equivalent to "license or public domain dedication," and "open source license" is equivalent to "open content license or public domain dedication."
Developer Certificate of Origin